{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "from mlxtend.frequent_patterns import apriori\n",
    "from mlxtend.frequent_patterns import association_rules, fpgrowth, fpmax"
   ]
  },
  {
   "cell_type": "markdown",
   "source": [
    "## 1 构造数据\n",
    "构造一个购物数据"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "### 1.1 购物明细数据"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "outputs": [
    {
     "data": {
      "text/plain": "   id          item\n0   0          Eggs\n1   0  Kidney Beans\n2   0          Milk\n3   0        Nutmeg\n4   0         Onion\n5   0        Yogurt\n6   1          Dill\n7   1          Eggs\n8   1  Kidney Beans\n9   1        Nutmeg",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>id</th>\n      <th>item</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>0</td>\n      <td>Eggs</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>0</td>\n      <td>Kidney Beans</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>0</td>\n      <td>Milk</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>0</td>\n      <td>Nutmeg</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>0</td>\n      <td>Onion</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>0</td>\n      <td>Yogurt</td>\n    </tr>\n    <tr>\n      <th>6</th>\n      <td>1</td>\n      <td>Dill</td>\n    </tr>\n    <tr>\n      <th>7</th>\n      <td>1</td>\n      <td>Eggs</td>\n    </tr>\n    <tr>\n      <th>8</th>\n      <td>1</td>\n      <td>Kidney Beans</td>\n    </tr>\n    <tr>\n      <th>9</th>\n      <td>1</td>\n      <td>Nutmeg</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 明细数据\n",
    "shopping_detail = pd.DataFrame(\n",
    "    [\n",
    "        [0, 'Eggs'],\n",
    "        [0, 'Kidney Beans'],\n",
    "        [0, 'Milk'],\n",
    "        [0, 'Nutmeg'],\n",
    "        [0, 'Onion'],\n",
    "        [0, 'Yogurt'],\n",
    "        [1, 'Dill'],\n",
    "        [1, 'Eggs'],\n",
    "        [1, 'Kidney Beans'],\n",
    "        [1, 'Nutmeg'],\n",
    "        [0, 'Onion'],\n",
    "        [0, 'Yogurt'],\n",
    "        [1, 'Dill'],\n",
    "        [1, 'Eggs'],\n",
    "        [1, 'Kidney Beans'],\n",
    "        [1, 'Nutmeg'],\n",
    "        [1, 'Onion'],\n",
    "        [1, 'Yogurt'],\n",
    "        [2, 'Apple'],\n",
    "        [2, 'Eggs'],\n",
    "        [2, 'Kidney Beans'],\n",
    "        [2, 'Milk'],\n",
    "        [3, 'Corn'],\n",
    "        [3, 'Kidney Beans'],\n",
    "        [3, 'Milk'],\n",
    "        [3, 'Unicorn'],\n",
    "        [3, 'Yogurt'],\n",
    "        [4, 'Corn'],\n",
    "        [4, 'Eggs'],\n",
    "        [4, 'Ice cream'],\n",
    "        [4, 'Kidney Beans'],\n",
    "        [4, 'Onion']],\n",
    "    columns=['id', 'item']\n",
    ")\n",
    "shopping_detail.head(10)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "outputs": [],
   "source": [
    "# 物品清单\n",
    "all_items_list = [\"Apple\", \"Corn\", \"Dill\", \"Eggs\", \"Ice cream\", \"Kidney Beans\", \"Milk\", \"Nutmeg\", \"Onion\", \"Unicorn\", \"Yogurt\"]"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "### 1.2 数据encode为bitmap形式"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "outputs": [
    {
     "data": {
      "text/plain": "    item_Apple  item_Corn  item_Dill  item_Eggs  item_Ice cream  \\\nid                                                                \n0        False      False      False       True           False   \n1        False      False       True       True           False   \n2         True      False      False       True           False   \n3        False       True      False      False           False   \n4        False       True      False       True            True   \n\n    item_Kidney Beans  item_Milk  item_Nutmeg  item_Onion  item_Unicorn  \\\nid                                                                        \n0                True       True         True        True         False   \n1                True      False         True        True         False   \n2                True       True        False       False         False   \n3                True       True        False       False          True   \n4                True      False        False        True         False   \n\n    item_Yogurt  \nid               \n0          True  \n1          True  \n2         False  \n3          True  \n4         False  ",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>item_Apple</th>\n      <th>item_Corn</th>\n      <th>item_Dill</th>\n      <th>item_Eggs</th>\n      <th>item_Ice cream</th>\n      <th>item_Kidney Beans</th>\n      <th>item_Milk</th>\n      <th>item_Nutmeg</th>\n      <th>item_Onion</th>\n      <th>item_Unicorn</th>\n      <th>item_Yogurt</th>\n    </tr>\n    <tr>\n      <th>id</th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n      <td>True</td>\n      <td>False</td>\n      <td>True</td>\n      <td>True</td>\n      <td>True</td>\n      <td>True</td>\n      <td>False</td>\n      <td>True</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>False</td>\n      <td>False</td>\n      <td>True</td>\n      <td>True</td>\n      <td>False</td>\n      <td>True</td>\n      <td>False</td>\n      <td>True</td>\n      <td>True</td>\n      <td>False</td>\n      <td>True</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>True</td>\n      <td>False</td>\n      <td>False</td>\n      <td>True</td>\n      <td>False</td>\n      <td>True</td>\n      <td>True</td>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>False</td>\n      <td>True</td>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n      <td>True</td>\n      <td>True</td>\n      <td>False</td>\n      <td>False</td>\n      <td>True</td>\n      <td>True</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>False</td>\n      <td>True</td>\n      <td>False</td>\n      <td>True</td>\n      <td>True</td>\n      <td>True</td>\n      <td>False</td>\n      <td>False</td>\n      <td>True</td>\n      <td>False</td>\n      <td>False</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "onehot_dt = pd.get_dummies(shopping_detail, columns=['item'], sparse=True)\n",
    "bitmap_dt = onehot_dt.groupby(by='id').apply(lambda x: x.filter(regex='item_.+$').sum() > 0)\n",
    "bitmap_dt"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## 2 挖掘频繁项"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "### 2.1 FPMax\n",
    "参考: http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/fpmax/"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "outputs": [
    {
     "data": {
      "text/plain": "   support   itemsets\n0      0.6     (5, 6)\n1      0.6  (8, 3, 5)\n2      0.6    (10, 5)",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>support</th>\n      <th>itemsets</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>0.6</td>\n      <td>(5, 6)</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>0.6</td>\n      <td>(8, 3, 5)</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>0.6</td>\n      <td>(10, 5)</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_fp_max_freq_items = fpmax(bitmap_dt, min_support=0.6)\n",
    "df_fp_max_freq_items"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "outputs": [
    {
     "data": {
      "text/plain": "   support                                    itemsets\n0      0.6              (item_Kidney Beans, item_Milk)\n1      0.6  (item_Kidney Beans, item_Onion, item_Eggs)\n2      0.6            (item_Yogurt, item_Kidney Beans)",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>support</th>\n      <th>itemsets</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>0.6</td>\n      <td>(item_Kidney Beans, item_Milk)</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>0.6</td>\n      <td>(item_Kidney Beans, item_Onion, item_Eggs)</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>0.6</td>\n      <td>(item_Yogurt, item_Kidney Beans)</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_fp_max_freq_items_col = fpmax(bitmap_dt, min_support=0.6, use_colnames=True)\n",
    "df_fp_max_freq_items_col"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "### 2.2 apriori\n",
    "参考: http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/apriori/"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "outputs": [
    {
     "data": {
      "text/plain": "    support   itemsets\n0       0.8        (3)\n1       1.0        (5)\n2       0.6        (6)\n3       0.6        (8)\n4       0.6       (10)\n5       0.8     (3, 5)\n6       0.6     (8, 3)\n7       0.6     (5, 6)\n8       0.6     (8, 5)\n9       0.6    (10, 5)\n10      0.6  (8, 3, 5)",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>support</th>\n      <th>itemsets</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>0.8</td>\n      <td>(3)</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>1.0</td>\n      <td>(5)</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>0.6</td>\n      <td>(6)</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>0.6</td>\n      <td>(8)</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>0.6</td>\n      <td>(10)</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>0.8</td>\n      <td>(3, 5)</td>\n    </tr>\n    <tr>\n      <th>6</th>\n      <td>0.6</td>\n      <td>(8, 3)</td>\n    </tr>\n    <tr>\n      <th>7</th>\n      <td>0.6</td>\n      <td>(5, 6)</td>\n    </tr>\n    <tr>\n      <th>8</th>\n      <td>0.6</td>\n      <td>(8, 5)</td>\n    </tr>\n    <tr>\n      <th>9</th>\n      <td>0.6</td>\n      <td>(10, 5)</td>\n    </tr>\n    <tr>\n      <th>10</th>\n      <td>0.6</td>\n      <td>(8, 3, 5)</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_freq_items = apriori(bitmap_dt, min_support=0.6)\n",
    "df_freq_items"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "outputs": [
    {
     "data": {
      "text/plain": "    support                                    itemsets\n0       0.8                                 (item_Eggs)\n1       1.0                         (item_Kidney Beans)\n2       0.6                                 (item_Milk)\n3       0.6                                (item_Onion)\n4       0.6                               (item_Yogurt)\n5       0.8              (item_Kidney Beans, item_Eggs)\n6       0.6                     (item_Onion, item_Eggs)\n7       0.6              (item_Kidney Beans, item_Milk)\n8       0.6             (item_Kidney Beans, item_Onion)\n9       0.6            (item_Yogurt, item_Kidney Beans)\n10      0.6  (item_Kidney Beans, item_Onion, item_Eggs)",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>support</th>\n      <th>itemsets</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>0.8</td>\n      <td>(item_Eggs)</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>1.0</td>\n      <td>(item_Kidney Beans)</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>0.6</td>\n      <td>(item_Milk)</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>0.6</td>\n      <td>(item_Onion)</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>0.6</td>\n      <td>(item_Yogurt)</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>0.8</td>\n      <td>(item_Kidney Beans, item_Eggs)</td>\n    </tr>\n    <tr>\n      <th>6</th>\n      <td>0.6</td>\n      <td>(item_Onion, item_Eggs)</td>\n    </tr>\n    <tr>\n      <th>7</th>\n      <td>0.6</td>\n      <td>(item_Kidney Beans, item_Milk)</td>\n    </tr>\n    <tr>\n      <th>8</th>\n      <td>0.6</td>\n      <td>(item_Kidney Beans, item_Onion)</td>\n    </tr>\n    <tr>\n      <th>9</th>\n      <td>0.6</td>\n      <td>(item_Yogurt, item_Kidney Beans)</td>\n    </tr>\n    <tr>\n      <th>10</th>\n      <td>0.6</td>\n      <td>(item_Kidney Beans, item_Onion, item_Eggs)</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
     },
     "execution_count": 47,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_freq_items_col = apriori(bitmap_dt, min_support=0.6, use_colnames=True)\n",
    "df_freq_items_col"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "### 2.3 fpgrowth\n",
    "参考: http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/fpgrowth/\n"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "outputs": [
    {
     "data": {
      "text/plain": "    support   itemsets\n0       1.0        (5)\n1       0.8        (3)\n2       0.6       (10)\n3       0.6        (8)\n4       0.6        (6)\n5       0.8     (3, 5)\n6       0.6    (10, 5)\n7       0.6     (8, 3)\n8       0.6     (8, 5)\n9       0.6  (8, 3, 5)\n10      0.6     (5, 6)",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>support</th>\n      <th>itemsets</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>1.0</td>\n      <td>(5)</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>0.8</td>\n      <td>(3)</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>0.6</td>\n      <td>(10)</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>0.6</td>\n      <td>(8)</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>0.6</td>\n      <td>(6)</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>0.8</td>\n      <td>(3, 5)</td>\n    </tr>\n    <tr>\n      <th>6</th>\n      <td>0.6</td>\n      <td>(10, 5)</td>\n    </tr>\n    <tr>\n      <th>7</th>\n      <td>0.6</td>\n      <td>(8, 3)</td>\n    </tr>\n    <tr>\n      <th>8</th>\n      <td>0.6</td>\n      <td>(8, 5)</td>\n    </tr>\n    <tr>\n      <th>9</th>\n      <td>0.6</td>\n      <td>(8, 3, 5)</td>\n    </tr>\n    <tr>\n      <th>10</th>\n      <td>0.6</td>\n      <td>(5, 6)</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_fp_freq_items = fpgrowth(bitmap_dt, min_support=0.6)\n",
    "df_fp_freq_items"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "outputs": [
    {
     "data": {
      "text/plain": "    support                                    itemsets\n0       1.0                         (item_Kidney Beans)\n1       0.8                                 (item_Eggs)\n2       0.6                               (item_Yogurt)\n3       0.6                                (item_Onion)\n4       0.6                                 (item_Milk)\n5       0.8              (item_Kidney Beans, item_Eggs)\n6       0.6            (item_Yogurt, item_Kidney Beans)\n7       0.6                     (item_Onion, item_Eggs)\n8       0.6             (item_Kidney Beans, item_Onion)\n9       0.6  (item_Kidney Beans, item_Onion, item_Eggs)\n10      0.6              (item_Kidney Beans, item_Milk)",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>support</th>\n      <th>itemsets</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>1.0</td>\n      <td>(item_Kidney Beans)</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>0.8</td>\n      <td>(item_Eggs)</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>0.6</td>\n      <td>(item_Yogurt)</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>0.6</td>\n      <td>(item_Onion)</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>0.6</td>\n      <td>(item_Milk)</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>0.8</td>\n      <td>(item_Kidney Beans, item_Eggs)</td>\n    </tr>\n    <tr>\n      <th>6</th>\n      <td>0.6</td>\n      <td>(item_Yogurt, item_Kidney Beans)</td>\n    </tr>\n    <tr>\n      <th>7</th>\n      <td>0.6</td>\n      <td>(item_Onion, item_Eggs)</td>\n    </tr>\n    <tr>\n      <th>8</th>\n      <td>0.6</td>\n      <td>(item_Kidney Beans, item_Onion)</td>\n    </tr>\n    <tr>\n      <th>9</th>\n      <td>0.6</td>\n      <td>(item_Kidney Beans, item_Onion, item_Eggs)</td>\n    </tr>\n    <tr>\n      <th>10</th>\n      <td>0.6</td>\n      <td>(item_Kidney Beans, item_Milk)</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_fp_freq_items_col = fpgrowth(bitmap_dt, min_support=0.6, use_colnames=True)\n",
    "df_fp_freq_items_col"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## 3 association_rules挖掘关联规则\n",
    "\n",
    "- 输入: 上述频繁项集, 需要包含support和itemsets\n",
    "\n",
    "- 计算方式可以基于: support、confidence、lift、leverage、conviction"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "outputs": [
    {
     "data": {
      "text/plain": "                       antecedents                      consequents  \\\n0                     (item_Onion)                      (item_Eggs)   \n1                      (item_Eggs)                     (item_Onion)   \n2  (item_Kidney Beans, item_Onion)                      (item_Eggs)   \n3   (item_Kidney Beans, item_Eggs)                     (item_Onion)   \n4                     (item_Onion)   (item_Kidney Beans, item_Eggs)   \n5                      (item_Eggs)  (item_Kidney Beans, item_Onion)   \n\n   antecedent support  consequent support  support  confidence  lift  \\\n0                 0.6                 0.8      0.6        1.00  1.25   \n1                 0.8                 0.6      0.6        0.75  1.25   \n2                 0.6                 0.8      0.6        1.00  1.25   \n3                 0.8                 0.6      0.6        0.75  1.25   \n4                 0.6                 0.8      0.6        1.00  1.25   \n5                 0.8                 0.6      0.6        0.75  1.25   \n\n   leverage  conviction  \n0      0.12         inf  \n1      0.12         1.6  \n2      0.12         inf  \n3      0.12         1.6  \n4      0.12         inf  \n5      0.12         1.6  ",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>antecedents</th>\n      <th>consequents</th>\n      <th>antecedent support</th>\n      <th>consequent support</th>\n      <th>support</th>\n      <th>confidence</th>\n      <th>lift</th>\n      <th>leverage</th>\n      <th>conviction</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>(item_Onion)</td>\n      <td>(item_Eggs)</td>\n      <td>0.6</td>\n      <td>0.8</td>\n      <td>0.6</td>\n      <td>1.00</td>\n      <td>1.25</td>\n      <td>0.12</td>\n      <td>inf</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>(item_Eggs)</td>\n      <td>(item_Onion)</td>\n      <td>0.8</td>\n      <td>0.6</td>\n      <td>0.6</td>\n      <td>0.75</td>\n      <td>1.25</td>\n      <td>0.12</td>\n      <td>1.6</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>(item_Kidney Beans, item_Onion)</td>\n      <td>(item_Eggs)</td>\n      <td>0.6</td>\n      <td>0.8</td>\n      <td>0.6</td>\n      <td>1.00</td>\n      <td>1.25</td>\n      <td>0.12</td>\n      <td>inf</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>(item_Kidney Beans, item_Eggs)</td>\n      <td>(item_Onion)</td>\n      <td>0.8</td>\n      <td>0.6</td>\n      <td>0.6</td>\n      <td>0.75</td>\n      <td>1.25</td>\n      <td>0.12</td>\n      <td>1.6</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>(item_Onion)</td>\n      <td>(item_Kidney Beans, item_Eggs)</td>\n      <td>0.6</td>\n      <td>0.8</td>\n      <td>0.6</td>\n      <td>1.00</td>\n      <td>1.25</td>\n      <td>0.12</td>\n      <td>inf</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>(item_Eggs)</td>\n      <td>(item_Kidney Beans, item_Onion)</td>\n      <td>0.8</td>\n      <td>0.6</td>\n      <td>0.6</td>\n      <td>0.75</td>\n      <td>1.25</td>\n      <td>0.12</td>\n      <td>1.6</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
     },
     "execution_count": 50,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_associate_rules = association_rules(df_freq_items_col, min_threshold=0.1, metric=\"leverage\")\n",
    "df_associate_rules"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "outputs": [
    {
     "data": {
      "text/plain": "                       antecedents                     consequents  \\\n0              (item_Kidney Beans)                     (item_Eggs)   \n1                      (item_Eggs)             (item_Kidney Beans)   \n2                    (item_Yogurt)             (item_Kidney Beans)   \n3                     (item_Onion)                     (item_Eggs)   \n4                     (item_Onion)             (item_Kidney Beans)   \n5  (item_Kidney Beans, item_Onion)                     (item_Eggs)   \n6          (item_Onion, item_Eggs)             (item_Kidney Beans)   \n7                     (item_Onion)  (item_Kidney Beans, item_Eggs)   \n8                      (item_Milk)             (item_Kidney Beans)   \n\n   antecedent support  consequent support  support  confidence  lift  \\\n0                 1.0                 0.8      0.8         0.8  1.00   \n1                 0.8                 1.0      0.8         1.0  1.00   \n2                 0.6                 1.0      0.6         1.0  1.00   \n3                 0.6                 0.8      0.6         1.0  1.25   \n4                 0.6                 1.0      0.6         1.0  1.00   \n5                 0.6                 0.8      0.6         1.0  1.25   \n6                 0.6                 1.0      0.6         1.0  1.00   \n7                 0.6                 0.8      0.6         1.0  1.25   \n8                 0.6                 1.0      0.6         1.0  1.00   \n\n   leverage  conviction  \n0      0.00         1.0  \n1      0.00         inf  \n2      0.00         inf  \n3      0.12         inf  \n4      0.00         inf  \n5      0.12         inf  \n6      0.00         inf  \n7      0.12         inf  \n8      0.00         inf  ",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>antecedents</th>\n      <th>consequents</th>\n      <th>antecedent support</th>\n      <th>consequent support</th>\n      <th>support</th>\n      <th>confidence</th>\n      <th>lift</th>\n      <th>leverage</th>\n      <th>conviction</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>(item_Kidney Beans)</td>\n      <td>(item_Eggs)</td>\n      <td>1.0</td>\n      <td>0.8</td>\n      <td>0.8</td>\n      <td>0.8</td>\n      <td>1.00</td>\n      <td>0.00</td>\n      <td>1.0</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>(item_Eggs)</td>\n      <td>(item_Kidney Beans)</td>\n      <td>0.8</td>\n      <td>1.0</td>\n      <td>0.8</td>\n      <td>1.0</td>\n      <td>1.00</td>\n      <td>0.00</td>\n      <td>inf</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>(item_Yogurt)</td>\n      <td>(item_Kidney Beans)</td>\n      <td>0.6</td>\n      <td>1.0</td>\n      <td>0.6</td>\n      <td>1.0</td>\n      <td>1.00</td>\n      <td>0.00</td>\n      <td>inf</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>(item_Onion)</td>\n      <td>(item_Eggs)</td>\n      <td>0.6</td>\n      <td>0.8</td>\n      <td>0.6</td>\n      <td>1.0</td>\n      <td>1.25</td>\n      <td>0.12</td>\n      <td>inf</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>(item_Onion)</td>\n      <td>(item_Kidney Beans)</td>\n      <td>0.6</td>\n      <td>1.0</td>\n      <td>0.6</td>\n      <td>1.0</td>\n      <td>1.00</td>\n      <td>0.00</td>\n      <td>inf</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>(item_Kidney Beans, item_Onion)</td>\n      <td>(item_Eggs)</td>\n      <td>0.6</td>\n      <td>0.8</td>\n      <td>0.6</td>\n      <td>1.0</td>\n      <td>1.25</td>\n      <td>0.12</td>\n      <td>inf</td>\n    </tr>\n    <tr>\n      <th>6</th>\n      <td>(item_Onion, item_Eggs)</td>\n      <td>(item_Kidney Beans)</td>\n      <td>0.6</td>\n      <td>1.0</td>\n      <td>0.6</td>\n      <td>1.0</td>\n      <td>1.00</td>\n      <td>0.00</td>\n      <td>inf</td>\n    </tr>\n    <tr>\n      <th>7</th>\n      <td>(item_Onion)</td>\n      <td>(item_Kidney Beans, item_Eggs)</td>\n      <td>0.6</td>\n      <td>0.8</td>\n      <td>0.6</td>\n      <td>1.0</td>\n      <td>1.25</td>\n      <td>0.12</td>\n      <td>inf</td>\n    </tr>\n    <tr>\n      <th>8</th>\n      <td>(item_Milk)</td>\n      <td>(item_Kidney Beans)</td>\n      <td>0.6</td>\n      <td>1.0</td>\n      <td>0.6</td>\n      <td>1.0</td>\n      <td>1.00</td>\n      <td>0.00</td>\n      <td>inf</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_associate_rules_by_confidence = association_rules(df_fp_freq_items_col, min_threshold=0.8, metric=\"confidence\")\n",
    "df_associate_rules_by_confidence"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "下面是一些调节参数的技巧：\n",
    "- Support：\n",
    "支持度是指一个规则在数据集中出现的频率。\n",
    "在确定支持度的阈值时，应该考虑数据集的大小和规则的复杂度。如果数据集很大，支持度阈值可以设置得较小，以便发现更多的规则。如果数据集较小，阈值应该设置得较高，以便避免过多的无意义规则。\n",
    "\n",
    "- Confidence：\n",
    "置信度是指当条件项出现时，结果项也会出现的概率。\n",
    "置信度的阈值应该根据具体情况来确定。如果要找到比较强的规则，阈值应该设置得高一些。如果希望发现更多的规则，则应该设置得低一些。\n",
    "\n",
    "- Lift：\n",
    "提升度是指条件项和结果项之间的关系与它们之间的随机关系之比。\n",
    "当提升度大于1时，说明条件项和结果项之间存在正相关关系；当提升度小于1时，说明它们之间存在负相关关系；当提升度等于1时，说明它们之间没有关系。可以通过设置提升度的阈值来筛选规则。\n",
    "\n",
    "- Leverage：\n",
    "杠杆值是指条件项和结果项同时出现的频率与它们分别独立出现的频率之差。\n",
    "杠杆值的绝对值越大，说明条件项和结果项之间的关联程度越高。可以通过设置杠杆值的阈值来筛选规则。\n",
    "\n",
    "- Conviction：\n",
    "确信度是指条件项和结果项之间的独立性度量，即结果项不受条件项影响的概率与实际情况下结果项不受条件项影响的概率之比。\n",
    "当确信度大于1时，说明条件项和结果项之间的关系是真实存在的；当确信度小于1时，说明它们之间存在相反的关系。可以通过设置确信度的阈值来筛选规则。"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}